feat(rag): Ask this book — conversational streaming chat + gpt-4.1-mini by mrviduus · Pull Request #385 · mrviduus/textstack

mrviduus · 2026-06-19T06:28:48Z

Turns the single-turn FAQ-style Ask into a real chat — grounding + citations + spoiler gate unchanged.

- Model: rag.ask → gpt-4.1-mini (dedicated openai-rag provider, mirrors openai-explain).
- Companion prompt: warm + conversational, handles greetings (no more cold 'the excerpts do not contain hi'); grounding is STRICT and overrides everything: every book-fact (incl. themes/symbolism/interpretation) must cite [n]; greeting/meta is the only no-cite case.
- Multi-turn memory: history (client-held, server-clamped to the last 6 turns / 4000 chars each) → real LlmMessage[]; retrieval still runs per latest question.
- SSE streaming (content-negotiated, copies Explain): delta* → terminal done {citations,lastReadOrd,insufficient}; empty chunks → friendly delta + done, no model call (spoiler/hallucination short-circuit on both paths). JSON fallback unchanged. Both catalog + user-book.
- Web: token-by-token render + caret, suggested starters on empty thread, history sent, abort-safe. Mobile keeps JSON.
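
A minimal sketch of the multi-turn memory item above, in TypeScript for illustration only. The names `Turn`, `ChatMessage`, `clampHistory`, and `buildMessages` are hypothetical, and so is the choice to carry the excerpts in a user-role message. It also assumes a turn is one question/answer pair and that the 4000-char cap applies per message. The message order (system -> excerpts -> history -> question) follows the commit message.

```typescript
// Hypothetical shapes; the real LlmMessage and AskRequest.History types are not shown in this PR text.
type Role = "system" | "user" | "assistant";
interface ChatMessage { role: Role; content: string; }
interface Turn { question: string; answer: string; }

const MAX_TURNS = 6;    // "last 6 turns"
const MAX_CHARS = 4000; // assumed to be a per-message cap

// Keep only the most recent turns and truncate oversized messages.
function clampHistory(history: Turn[]): Turn[] {
  return history.slice(-MAX_TURNS).map((t) => ({
    question: t.question.slice(0, MAX_CHARS),
    answer: t.answer.slice(0, MAX_CHARS),
  }));
}

// Assemble messages in the order: system -> excerpts -> history -> question.
function buildMessages(
  systemPrompt: string,
  excerpts: string[],
  history: Turn[],
  question: string,
): ChatMessage[] {
  // Number the excerpts so the model can cite them as [n].
  const numbered = excerpts.map((e, i) => `[${i + 1}] ${e}`).join("\n\n");
  const messages: ChatMessage[] = [
    { role: "system", content: systemPrompt },
    { role: "user", content: `Excerpts:\n${numbered}` },
  ];
  for (const t of clampHistory(history)) {
    messages.push({ role: "user", content: t.question });
    messages.push({ role: "assistant", content: t.answer });
  }
  messages.push({ role: "user", content: question });
  return messages;
}
```

Because the history is client-held, the clamp runs on the server regardless of what the client sends; an empty history reduces this to the original single-turn shape.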

architect → backend + web → adversarial QA (SHIP) — closed a grounding loophole (interpretation/themes weren't bound) + added empty-chunk-no-LLM-call tests. 878 unit + 564 web green; solution + web build clean.
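
The empty-chunk short-circuit and the test that guards it can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `answerQuestion`, `Chunk`, `ModelCall`, the callback-based streaming, the citation shape, and setting `insufficient` to true on the empty path are all hypothetical. A real handler would be asynchronous.

```typescript
// Hypothetical event and chunk shapes, modelled on "delta* -> done {citations,lastReadOrd,insufficient}".
interface Chunk { id: string; text: string; }
interface Citation { n: number; chunkId: string; }
type AskEvent =
  | { type: "delta"; text: string }
  | { type: "done"; citations: Citation[]; lastReadOrd: number; insufficient: boolean };

type Emit = (event: AskEvent) => void;
// The model is injected so a test can count how often it is called.
type ModelCall = (excerpts: string[], question: string, onToken: (token: string) => void) => void;

const NO_EXCERPTS_REPLY = "I could not find anything about that in what you have read so far.";

function answerQuestion(
  chunks: Chunk[],
  question: string,
  lastReadOrd: number,
  callModel: ModelCall,
  emit: Emit,
): void {
  if (chunks.length === 0) {
    // Nothing retrieved: send a friendly delta plus the terminal done, and never call the model.
    emit({ type: "delta", text: NO_EXCERPTS_REPLY });
    emit({ type: "done", citations: [], lastReadOrd: lastReadOrd, insufficient: true });
    return;
  }
  // Normal path: forward each model token as a delta, then close with done.
  callModel(chunks.map((c) => c.text), question, (token) => emit({ type: "delta", text: token }));
  emit({
    type: "done",
    citations: chunks.map((c, i) => ({ n: i + 1, chunkId: c.id })),
    lastReadOrd: lastReadOrd,
    insufficient: false,
  });
}
```

An empty-chunk-no-LLM-call test then only needs a fake `callModel` that increments a counter, plus an assertion that the counter stays at zero when `chunks` is empty.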

⚠️ The companion prompt loosened the cold-refusal rule — re-run the paid grounding/citation eval on /ai-quality post-deploy (only runtime confirmation of the tightened prompt).
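
As a cheap local complement to that eval (not a replacement for it), the `[n]` citation rule can be spot-checked mechanically. The sketch below is hypothetical (`citedIndices` and `invalidCitations` are not names from this PR) and only verifies that every `[n]` marker points at an excerpt that was actually supplied. It cannot tell whether an uncited sentence states a book-fact.

```typescript
// Collect the distinct [n] markers that appear in an answer, in ascending order.
function citedIndices(answer: string): number[] {
  const pattern = /\[(\d+)\]/g;
  const seen: number[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(answer)) !== null) {
    const n = Number(match[1]);
    if (seen.indexOf(n) === -1) seen.push(n);
  }
  return seen.sort((a, b) => a - b);
}

// Return markers that do not correspond to any supplied excerpt (excerpts are numbered from 1).
function invalidCitations(answer: string, excerptCount: number): number[] {
  return citedIndices(answer).filter((n) => n < 1 || n > excerptCount);
}
```

For example, an answer citing `[1]` and `[3]` when only two excerpts were sent yields `[3]` from `invalidCitations`, which flags a citation with nothing behind it.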

🤖 Generated with Claude Code

Turns the single-turn FAQ into a real chat (grounding + citations + spoiler gate unchanged).

- Model: rag.ask -> gpt-4.1-mini (dedicated openai-rag provider, mirrors openai-explain; translate/podcast stay nano).
- Companion system prompt: warm, conversational, handles greetings; grounding is STRICT and overrides everything (covers themes/symbolism/interpretation, not just plot) -> every book-fact cites [n]; greeting/meta the only no-cite case.
- Multi-turn memory: AskRequest.History (client-held, server-clamped last 6 turns / 4000 chars each); real LlmMessage[] (system -> excerpts -> history -> question); retrieval still per latest question. Eval call passes [].
- SSE streaming (content-negotiated, copies Explain): delta* then terminal done {citations,lastReadOrd,insufficient}; empty chunks -> friendly delta + done, NO model call (spoiler/hallucination short-circuit, both paths). JSON fallback unchanged. Both catalog + user-book ask. MaxOutputTokens 320->400.
- Web: streaming token render + blinking caret, suggested starters on empty thread, last-6 history sent, abort on new-question/unmount. Mobile keeps JSON.

architect -> backend+web -> adversarial QA (SHIP; closed grounding loophole + added empty-chunk-no-LLM-call tests). 878 unit + 564 web green.

NOTE: re-run the paid grounding/citation eval on /ai-quality post-deploy.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
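
On the web side, consuming the `delta*` then `done` stream could look like the sketch below. The wire format is an assumption: this text does not say whether the events use the SSE `event:` field or what the delta payload looks like, so `event: delta` with `data: {"text": ...}` is hypothetical, as are `parseSse`, `applyEvent`, and `AnswerState`. Only `\n` and `\r\n` line endings are handled, which is a simplification of the SSE format.

```typescript
interface SseEvent { event: string; data: string; }

// Split a text buffer into complete SSE frames; return the unfinished tail as `rest`.
function parseSse(buffer: string): { events: SseEvent[]; rest: string } {
  const frames = buffer.replace(/\r\n/g, "\n").split("\n\n");
  const rest = frames.pop() ?? ""; // last piece is incomplete (or empty)
  const events: SseEvent[] = [];
  for (const frame of frames) {
    let event = "message"; // default event name when no "event:" field is present
    const data: string[] = [];
    for (const line of frame.split("\n")) {
      if (line === "" || line.charAt(0) === ":") continue; // blank line or comment
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.charAt(0) === " ") value = value.slice(1); // strip one leading space
      if (field === "event") event = value;
      else if (field === "data") data.push(value);
    }
    if (data.length > 0) events.push({ event: event, data: data.join("\n") });
  }
  return { events: events, rest: rest };
}

interface AnswerState { text: string; done: boolean; insufficient: boolean; }

// Fold one event into the rendered answer: deltas append text, done closes the turn.
function applyEvent(state: AnswerState, ev: SseEvent): AnswerState {
  if (ev.event === "delta") {
    const payload = JSON.parse(ev.data);
    return { text: state.text + payload.text, done: false, insufficient: false };
  }
  if (ev.event === "done") {
    const payload = JSON.parse(ev.data);
    return { text: state.text, done: true, insufficient: payload.insufficient === true };
  }
  return state; // ignore unknown event names
}
```

A caller would prepend `rest` to the next network chunk before parsing again, so a frame split across two reads is still delivered whole. Aborting on a new question or unmount would typically be done by passing an `AbortController` signal to the `fetch` call that produces those chunks.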

mrviduus merged commit c80bbd0 into main Jun 19, 2026
5 checks passed

mrviduus deleted the rag-chat-upgrade branch June 19, 2026 06:37

mrviduus merged 1 commit into main from rag-chat-upgrade
